3574 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Spanish Swedish
Availability:
Freely Available
License:
CreativeCommons
Size:
7 GByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Extended CLEF eHealth 2013-2015 IR Test Collection | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Hungarian Polish Spanish Swedish
Availability:
Freely Available
License:
CreativeCommons
Size:
2 MByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Khresmoi Summary Translation Test Data 2.0 | /N |
Documentation:
None
Written
Treebank,
Language Type:
Monolingual
Languages:
English
Availability:
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference
- Paper track: Short/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Nikita Kitaev | Penn Treebank | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Bilingual
Languages:
English Mandarin Chinese
Availability:
LDC
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Meta-Transfer Learning for Code-Switched Speech Recognition
- Paper track: Short/Speech and Multimodality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Genta Indra Winata | Mandarin-English Code-Switching in South-East Asia | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CreativeCommons
Size:
1488 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems
- Paper track: Long/Speech and Multimodality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Preethi Jyothi | Mozilla Common Voice | /N |
Documentation:
Meta data is available along with the dataset
Speech
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CreativeCommons
Size:
1000 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems
- Paper track: Long/Speech and Multimodality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Preethi Jyothi | Librispeech | /N |
Documentation:
Yes a paper is available as documentation in English: https://ieeexplore.ieee.org/document/7178964
Speech
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
5.4 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems
- Paper track: Long/Speech and Multimodality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Preethi Jyothi | TIMIT Acoustic-Phonetic Continuous Speech Corpus | /N |
Documentation:
Yes, documentation is available in English along with the dataset
Speech
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
None hours
Production Status:
Existing-used
Use:
Acquisition
- Paper title: Learning to Understand Child-directed and Adult-directed Speech
- Paper track: Short/Cognitive Modeling and Psycholinguistics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Lieke Gelderloos | NewmanRatner Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
10 MByte
Production Status:
Existing-used
Use:
Pseudo code
- Paper title: Semantic Scaffolds for Pseudocode-to-Code Generation
- Paper track: Long/NLP Applications
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ruiqi Zhong | Search-based Pseudocode to Code | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Apache II
Size:
5.3 MByte
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
- Paper title: Toxicity Detection: Does Context Really Matter?
- Paper track: Long/NLP Applications
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | John Pavlopoulos | Context Aware Toxicity (CAT) | /N |
Documentation:
Publicly available readme file in English.




